Dynamic Appearance: A Video Representation for Action Recognition with Joint Training
The static appearance of a video may impede the ability of a deep neural network to
learn motion-relevant features in video action recognition. In this paper, we
introduce a new concept, Dynamic Appearance (DA), summarizing the appearance
information relating to movement in a video while filtering out the static
information considered unrelated to motion. We consider distilling the dynamic
appearance from raw video data as a means of efficient video understanding. To
this end, we propose the Pixel-Wise Temporal Projection (PWTP), which projects
the static appearance of a video into a subspace within its original vector
space, while the dynamic appearance is encoded in the projection residual
describing a special motion pattern. Moreover, we integrate the PWTP module
with a CNN or Transformer into an end-to-end training framework, which is
optimized by utilizing multi-objective optimization algorithms. We provide
extensive experimental results on four action recognition benchmarks:
Kinetics400, Something-Something V1, UCF101, and HMDB51.
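The projection-residual idea behind PWTP can be illustrated with a toy, non-learned projection (the paper's PWTP module learns the subspace end-to-end; here the static subspace is simply the constant temporal basis, i.e. the per-pixel mean, which is an assumption for illustration only):

```python
import numpy as np

def pwtp_residual(video):
    """Toy pixel-wise temporal projection (illustrative, not the learned PWTP).

    video: (T, H, W) grayscale frames. Each pixel's temporal vector is
    projected onto the constant (all-ones) basis -- its temporal mean -- and
    the projection residual serves as a crude "dynamic appearance".
    """
    T = video.shape[0]
    basis = np.ones(T) / np.sqrt(T)                       # unit constant basis
    coeff = np.tensordot(basis, video, axes=([0], [0]))   # (H, W) coefficients
    static = basis[:, None, None] * coeff                 # static appearance
    return video - static                                 # motion-related residual
```

A static pixel yields a near-zero residual, while a pixel whose intensity changes over time keeps a nonzero residual, which is the separation the abstract describes.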
Learning an evolved mixture model for task-free continual learning
Recently, continual learning (CL) has gained significant interest because it
enables deep learning models to acquire new knowledge without forgetting
previously learnt information. However, most existing works require knowing the
task identities and boundaries, which is rarely available in practice. In
this paper, we address a more challenging and realistic setting in CL, namely
the Task-Free Continual Learning (TFCL) in which a model is trained on
non-stationary data streams with no explicit task information. To address TFCL,
we introduce an evolved mixture model whose network architecture is dynamically
expanded to adapt to the data distribution shift. We implement this expansion
mechanism by evaluating the probability distance between the knowledge stored
in each mixture model component and the current memory buffer using the
Hilbert-Schmidt Independence Criterion (HSIC). We further introduce two simple dropout
mechanisms to selectively remove stored examples in order to avoid memory
overload while preserving memory diversity. Empirical results demonstrate that
the proposed approach achieves excellent performance.Comment: Accepted by the 29th IEEE International Conference on Image
Processing (ICIP 2022
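The probability distance driving the expansion decision can be sketched with the standard biased empirical HSIC estimator over Gaussian kernels (the kernel choice and bandwidth below are assumptions for illustration; the paper applies HSIC between each component's stored knowledge and the current memory buffer):

```python
import numpy as np

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC with Gaussian kernels: tr(K H L H) / (n-1)^2,
    where H is the centering matrix. A larger value indicates stronger
    statistical dependence between the two sample sets."""
    n = X.shape[0]

    def gram(Z):
        sq = np.sum(Z ** 2, axis=1)
        d = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T     # pairwise sq. dists
        return np.exp(-d / (2.0 * sigma ** 2))

    H = np.eye(n) - np.ones((n, n)) / n                   # centering matrix
    K, L = gram(X), gram(Y)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

In an expansion mechanism of this kind, a low HSIC between a new data batch and every existing component would signal a distribution shift and trigger the addition of a new mixture component.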
Binary morphological shape-based interpolation applied to 3-D tooth reconstruction
In this paper we propose an interpolation algorithm using a mathematical morphology morphing approach. The aim of this algorithm is to reconstruct the n-dimensional object from a group of (n-1)-dimensional sets representing sections of that object. The morphing transformation modifies pairs of consecutive sets such that they approach each other in shape and size. The interpolated set is achieved when the two consecutive sets are made idempotent by the morphing transformation. We prove the convergence of the morphological morphing. The entire object is modeled by successively interpolating a certain number of intermediary sets between each two consecutive given sets. We apply the interpolation algorithm to 3-D tooth reconstruction.
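The idea of producing an intermediate section between two consecutive binary slices can be sketched with a distance-based variant of shape-based interpolation (this averages signed city-block distances and thresholds at zero; it is a simpler stand-in, not the paper's iterative morphing operator):

```python
import numpy as np

def dilate(S):
    """One step of 4-connected binary dilation."""
    out = S.copy()
    out[1:, :] |= S[:-1, :]
    out[:-1, :] |= S[1:, :]
    out[:, 1:] |= S[:, :-1]
    out[:, :-1] |= S[:, 1:]
    return out

def signed_dist(S):
    """City-block signed distance: positive inside S, negative outside.
    Assumes S and its complement are both non-empty."""
    d = np.zeros(S.shape)
    cur, k = S.copy(), 0
    while not cur.all():                 # grow S outward: label outside pixels
        nxt = dilate(cur)
        d[nxt & ~cur] = -(k + 1)
        cur, k = nxt, k + 1
    cur, k = ~S, 0
    while not cur.all():                 # grow complement inward: label inside
        nxt = dilate(cur)
        d[nxt & ~cur] = k + 1
        cur, k = nxt, k + 1
    return d

def interpolate(A, B):
    """Intermediate section between binary sections A and B: the zero
    super-level set of the averaged signed distances."""
    return (signed_dist(A) + signed_dist(B)) >= 0
```

Given a large square on one slice and a small concentric square on the next, the interpolated section is a square of intermediate size, which mirrors the "approach each other in shape and size" behaviour described in the abstract.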
Masked Image Residual Learning for Scaling Deeper Vision Transformers
Deeper Vision Transformers (ViTs) are more challenging to train. We expose a
degradation problem in deeper layers of ViT when using masked image modeling
(MIM) for pre-training. To ease the training of deeper ViTs, we introduce a
self-supervised learning framework called Masked Image Residual Learning
(MIRL), which significantly alleviates the degradation problem, making scaling
ViT along depth a promising direction for performance upgrade. We reformulate
the pre-training objective for deeper layers of ViT as learning to recover the
residual of the masked image. We provide extensive empirical evidence showing
that deeper ViTs can be effectively optimized using MIRL and easily gain
accuracy from increased depth. With the same level of computational complexity
as ViT-Base and ViT-Large, we instantiate 4.5x and 2x deeper
ViTs, dubbed ViT-S-54 and ViT-B-48. The deeper ViT-S-54, costing 3x less
than ViT-Large, achieves performance on par with ViT-Large. ViT-B-48 achieves
86.2% top-1 accuracy on ImageNet. On one hand, deeper ViTs pre-trained with
MIRL exhibit excellent generalization capabilities on downstream tasks, such as
object detection and semantic segmentation. On the other hand, MIRL
demonstrates high pre-training efficiency. With less pre-training time, MIRL
yields competitive performance compared to other approaches.
Defining Image Memorability using the Visual Memory Schema
Memorability of an image is a characteristic determined by the human observers’ ability to remember images they have seen. Yet recent work on image memorability defines it as an intrinsic property that can be obtained independent of the observer. The current study aims to enhance our understanding and prediction of image memorability, improving upon existing approaches by incorporating the properties of cumulative human annotations. We propose a new concept called the Visual Memory Schema (VMS) referring to an organization of image components human observers share when encoding and recognizing images. The concept of VMS is operationalised by asking human observers to define memorable regions of images they were asked to remember during an episodic memory test. We then statistically assess the consistency of VMSs across observers for either correctly or incorrectly recognised images. The associations of the VMSs with eye fixations and saliency are analysed separately as well. Lastly, we adapt various deep learning architectures for the reconstruction and prediction of memorable regions in images and analyse the results when using transfer learning at the outputs of different convolutional network layers
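One generic way to quantify how consistent VMSs are across observers (an assumed protocol for illustration, not necessarily the paper's exact statistic) is split-half correlation of the averaged annotation maps:

```python
import numpy as np

def vms_consistency(maps, seed=0):
    """Split-half consistency of per-observer annotation maps: split the
    observers into two random halves, average each half's maps, and
    correlate the two mean maps. Values near 1 indicate a shared visual
    memory schema; values near 0 indicate idiosyncratic annotations."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(maps))
    a = np.mean([maps[i] for i in idx[: len(maps) // 2]], axis=0)
    b = np.mean([maps[i] for i in idx[len(maps) // 2 :]], axis=0)
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]
```

Comparing this score between correctly and incorrectly recognised images would then reveal whether shared schemas accompany successful recognition, in the spirit of the consistency analysis described above.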
Steganalysis of 3D objects using statistics of local feature sets
3D steganalysis aims to identify subtle invisible changes produced in graphical objects through digital watermarking or steganography. Sets of statistical representations of 3D features, extracted from both cover and stego 3D mesh objects, are used as inputs to machine learning classifiers in order to decide whether any information is hidden in the given graphical object. The features proposed in this paper include those representing the local object curvature, vertex normals, and the local geometry representation in the spherical coordinate system. The effectiveness of these features is tested in various combinations with other features used for 3D steganalysis. The relevance of each feature for 3D steganalysis is assessed using the Pearson correlation coefficient. Six different 3D watermarking and steganographic methods are used for creating the stego-objects in the evaluation study.
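Ranking features by Pearson correlation with the cover/stego label, as the relevance assessment above does, is straightforward to sketch (the feature matrix and label layout here are assumptions for illustration):

```python
import numpy as np

def feature_relevance(F, y):
    """Rank features by |Pearson correlation| with the cover(0)/stego(1)
    label. F: (n_samples, n_features), y: (n_samples,) binary labels.
    Returns (ranking of feature indices, per-feature correlation r)."""
    Fc = F - F.mean(axis=0)
    yc = y - y.mean()
    r = (Fc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Fc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()) + 1e-12)
    return np.argsort(-np.abs(r)), r
```

Features with |r| near zero carry little marginal information about the presence of hidden data and are candidates for pruning from the steganalysis feature set.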
Enhancing reliability and efficiency for real-time robust adaptive steganography using cyclic redundancy check codes
The development of multimedia and deep learning technology brings new challenges to steganography and steganalysis techniques. Meanwhile, robust steganography, a class of new techniques aiming to solve the problem of covert communication over lossy channels, has become a new research hotspot in the field of information hiding. To improve the communication reliability and efficiency of current real-time robust steganography methods, a concatenated code, composed of Syndrome–Trellis codes (STC) and cyclic redundancy check (CRC) codes, is proposed in this paper. The enhanced robust adaptive steganography framework proposed in this paper is characterized by a strong error detection capability, high coding efficiency, and low embedding costs. On this basis, three adaptive steganographic methods resisting JPEG compression and detection are proposed. Then, the fault tolerance of the proposed steganography methods is analyzed using the residual model of JPEG compression, thus obtaining appropriate coding parameters. Experimental results show that the proposed methods have significantly stronger robustness against compression, and are harder to detect by statistics-based steganalytic methods.
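The CRC outer code's role in the concatenated scheme is to flag payloads corrupted by the lossy channel after extraction. A minimal sketch with a standard CRC-32 (the STC inner code is omitted, and the framing below is an assumption for illustration):

```python
import zlib

def protect(payload: bytes) -> bytes:
    """Append a CRC-32 checksum to the secret payload before the inner
    (STC) embedding stage."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def verify(extracted: bytes):
    """After extraction from the lossy channel, return the payload if the
    CRC checks out, else None (signalling an error, e.g. for retransmission)."""
    data, tag = extracted[:-4], extracted[-4:]
    return data if zlib.crc32(data).to_bytes(4, "big") == tag else None
```

This strong error detection at a 4-byte overhead is what lets the receiver distinguish a successful covert transmission from one damaged by JPEG recompression.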
Watermarking of 3D shapes using localized constraints
This paper develops a digital watermarking methodology for 3-D graphical objects defined by polygonal meshes. In watermarking or fingerprinting, the aim is to embed a code in a given medium without producing identifiable changes to it. One should be able to retrieve the embedded information even after the shape has suffered various modifications. Two blind watermarking techniques applying perturbations to the local geometry of selected vertices are described in this paper. The proposed methods produce localized changes of vertex locations that do not alter the mesh topology. A study of the effects caused by vertex location modification is provided for a general class of surfaces. The robustness of the proposed algorithms is tested against noise perturbation and object cropping.
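The topology-preserving embedding can be sketched in a few lines: selected vertices are displaced slightly along their normals, with the displacement sign carrying one bit each (a simplified illustration in the spirit of the approach, not the paper's actual localized-constraint scheme):

```python
import numpy as np

def embed(vertices, normals, bits, idx, eps=1e-3):
    """Toy blind embedding: displace each selected vertex along its unit
    normal, sign chosen by the watermark bit. Only vertex positions change;
    the mesh connectivity (topology) is untouched.

    vertices: (n, 3) positions, normals: (n, 3) unit normals,
    bits: watermark bits, idx: indices of the selected carrier vertices.
    """
    out = vertices.copy()
    for b, i in zip(bits, idx):
        out[i] += (1.0 if b else -1.0) * eps * normals[i]
    return out
```

Because only a small, bounded perturbation is applied to a few vertex locations, the change is visually unidentifiable, while a blind detector that knows the carrier vertices can read the bits back from the displacement sign.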